Online optimization for variable selection in data streams

نویسندگان

  • Christoforos Anagnostopoulos
  • Dimitris K. Tasoulis
  • David J. Hand
  • Niall M. Adams
چکیده

Variable selection for regression is a classical statistical problem, motivated by concerns that too many covariates invite overfitting. Existing approaches notably include a class of convex optimisation techniques, such as the Lasso algorithm. Such techniques are invariably reliant on assumptions that are unrealistic in streaming contexts, namely that the data is available off-line and the correlation structure is static. In this paper, we relax both these constraints, proposing for the first time an online implementation of the Lasso algorithm with exponential forgetting. We also optimise the model dimension and the speed of forgetting in an online manner, resulting in a fully automatic scheme. In simulations our scheme improves on recursive least squares in dynamic environments, while also featuring model discovery and changepoint detection capabilities.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of genetic algorithm (GA) to select input variables in support vector machine (SVM) for analyzing the occurrence of roach, Rutilus rutilus, in streams

Support vector machine (SVM) was used to analyze the occurrence of roach in Flemish stream basins (Belgium). Several habitat and physico?chemical variables were used as inputs for the model development. The biotic variable merely consisted of abundance data which was used for predicting presence/absence of roach. Genetic algorithm (GA) was combined with SVM in order to select the most important...

متن کامل

ارائه روشی پویا جهت پاسخ به پرس‌وجوهای پیوسته تجمّعی اقتضایی

Data Streams are infinite, fast, time-stamp data elements which are received explosively. Generally, these elements need to be processed in an online, real-time way. So, algorithms to process data streams and answer queries on these streams are mostly one-pass. The execution of such algorithms has some challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...

متن کامل

Sparse partial least squares for on-line variable selection in multivariate data streams

In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition ...

متن کامل

Predictive modeling with high-dimensional data streams: an on-line variable selection approach

In this paper we propose a computationally efficient algorithm for on-line variable selection in multivariate regression problems involving high dimensional data streams. The algorithm recursively extracts all the latent factors of a partial least squares solution and selects the most important variables for each factor. This is achieved by means of only one sparse singular value decomposition ...

متن کامل

A systematic review on supplier selection and order allocation problems

The supplier selection and order allocation are two key strategic decisions in purchasing problem. The review presented in this paper focuses on the supplier selection problems (SSP) and order allocation from year 2000 to 2017 in which a new structure and classification of the existing research streams and the different MCDM methods and mathematical models used for SSP will be presented. The re...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008